Abstract

A rateless coding scheme transmits incrementally more and more
coded bits over an unknown channel until all the information bits
are decoded reliably by the receiver. We propose a new rateless coding
scheme based on polar codes, and we show that this scheme is
capacity-achieving, i.e. its information rate is as good as that of the best
code specifically designed for the unknown channel. Previous rateless
coding schemes are designed for specific classes of channels such as
AWGN channels, binary erasure channels, etc., but the proposed rateless
coding scheme is capacity-achieving for broad classes of channels
as long as they are ordered via degradation. Moreover, it inherits the
conceptual and computational simplicity of polar codes.


1 Introduction 

In many communication scenarios, the quality of the communication channel
is unknown to the transmitter. One possibility is to design a fixed-rate code
for the worst possible channel, but this often leads to an overly conservative
solution. Another possibility is to use a rateless code, which transmits an
increasing number of coded bits until all the information bits can be decoded
reliably by the receiver. This solution requires the receiver to give simple
ACK/NACK feedback to the transmitter, but this capability is available in
many communication scenarios. For example, rateless codes, appearing under
the name of hybrid ARQ or incremental redundancy schemes, are an essential
part of reliable and efficient wireless communication systems. As another
example, rateless codes are very useful for packet erasure networks where the
erasure rate is unknown.

A fixed-rate code designed for a specific channel is judged by its performance
on that channel. In contrast, a rateless code is designed for a class of
channels describing the channel uncertainty, and it is judged by its performance
on all the channels in the class. A rateless coding scheme is said to
be capacity-achieving over a class of channels if, for each channel in that class,
the number of coded bits transmitted by the scheme until reliable decoding is
no more than the number of coded bits a capacity-achieving code specifically
designed for that channel needs to transmit. Elementary information-theoretic
considerations show that random codes are capacity-achieving rateless
codes for any class of channels which share the same capacity-achieving
input distribution. A more interesting problem is the explicit construction of
capacity-achieving rateless codes which have efficient encoding and decoding.
Two classes of such codes have been constructed. The first are the LT codes
of Luby and the closely related Raptor codes of Shokrollahi [5], which
are specifically designed for packet erasure channels. The second are
the rateless codes designed for AWGN channels by Erez, Trott and Wornell
[8]. These rateless codes are built using fixed-rate capacity-achieving AWGN
codes as base codes.

In this paper, we propose a new rateless coding scheme based on polar
codes. Polar codes are the first class of low-complexity codes that are shown
to achieve the capacity of a wide range of channels [1]. By leveraging this key
property of polar codes, the rateless coding scheme we designed is shown to
be capacity-achieving for general classes of channels totally ordered by
degradation. This is in contrast to the above two classes of rateless codes, each
of which is designed for a specific such class (erasure channels and AWGN
channels respectively).

One approach to designing a rateless coding scheme, used in turbo and
LDPC based hybrid ARQ schemes, is puncturing: 1) a "mother" code with
a very low coding rate is first designed; 2) the mother code is significantly
punctured and the remaining coded bits are sent in the first transmission; 3)
the punctured bits are incrementally sent in later transmissions. For
polar codes, however, it is difficult to design a rateless scheme in this fashion. Due
to the highly structured nature of polar codes, it is unclear how to puncture
a "mother" polar code with a low coding rate and keep the punctured
code a "good" code. Nor is it clear how to incrementally add coded bits
to a polar code with a very high coding rate and keep the final low-rate
code a "good" polar code.

However, there is actually a very natural way to build a rateless scheme
based on polar codes. Recall that a fixed-rate polar code is constructed
by applying a linear transformation recursively to convert the underlying
channel into a set of noiseless channels and a set of completely noisy channels
under successive decoding. Information bits are transmitted
on the noiseless channels while known (frozen) bits are transmitted on the
completely noisy channels. If the channel is known at the transmitter,
then the transmitter knows exactly which channels are noiseless and
which are completely noisy, and this scheme can be implemented.
If the channel is unknown, then the transmitter does not know which channels
are noiseless and which are completely noisy, but what it does know is a
reliability ordering of the channels, such that regardless of what the underlying
channel is, a more reliable channel is always noiseless if a less reliable channel
is noiseless, and a less reliable channel is always completely noisy if a more
reliable channel is completely noisy.

Given this reliability ordering, a rateless scheme can be designed as follows.
The initial transmission can be done aggressively using a high-rate
polar code with many information bits and very few frozen bits. If this
transmission cannot be decoded, then too many information bits were sent
and too few bits were frozen. Among the information bits sent in the first
transmission, the ones sent on the less reliable channels are retransmitted in
future transmissions. Once these bits are decoded from the future transmissions,
they effectively become frozen, allowing the rest of the information bits sent
in the first transmission to be decoded. Thus, this scheme can be called
incremental freezing, as future transmissions successively freeze more and more
information bits sent in earlier transmissions.

In section 2, we present more details of the scheme and show that it is
capacity-achieving. In section 3, we present a finite blocklength design with
soft combining decoders and provide some simulation results. Finally we
draw some conclusions.


2 Rateless Polar Codes


2.1 Polar Codes Basics 

Given a binary-input channel $W$ with arbitrary output alphabet $\mathcal{Y}$, the first
step of the standard polarization process creates two binary-input channels:

$$W^-(y_1, y_2 \mid x) := \frac{1}{2} \sum_{u \in \{0,1\}} W(y_1 \mid u + x)\, W(y_2 \mid u)$$

$$W^+(y_1, y_2, u \mid x) := \frac{1}{2}\, W(y_1 \mid u + x)\, W(y_2 \mid x)$$

This creates the 1st-level channels. The channels at the $(n+1)$-th level of the recursion
can be constructed from the $n$-th level channels. Given any length-$n$ sequence
$s$ of $+$'s and $-$'s, define the channels:

$$W^{s-} := (W^s)^-$$
$$W^{s+} := (W^s)^+$$

The theory of polarization [1] shows that as $n \to \infty$, a subset $\mathcal{S}(W)$ of the
$2^n$ level-$n$ channels will have their mutual information converging to 1, and
the rest will have mutual information converging to zero. By sending information
bits on the former subset and "freezing" the latter subset with known bits,
capacity can be achieved with a successive cancellation decoder. In the sequel,
we will call $\mathcal{S}(W)$ the good bit indices. Note that this set depends on the
original channel $W$.
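
To make the recursion concrete, the following minimal sketch specializes it to a binary erasure channel, where one polarization step has a well-known closed form: a BEC with erasure probability $\epsilon$ yields a $W^-$ that is a BEC with erasure probability $2\epsilon - \epsilon^2$ and a $W^+$ that is a BEC with erasure probability $\epsilon^2$. The function names, the index convention and the threshold `delta` are illustrative assumptions and not part of the scheme itself; at finite $n$ the thresholded set is only a stand-in for $\mathcal{S}(W)$.

```python
def bec_polarize(eps, n):
    """Erasure probabilities of the 2**n level-n channels of a BEC(eps).

    Index convention (an assumption of this sketch): the sign sequence s is
    read as a binary number with '-' = 0 and '+' = 1, first sign most significant.
    """
    z = [eps]
    for _ in range(n):
        nxt = []
        for e in z:
            nxt.append(2 * e - e * e)  # W^-: erased unless both copies survive
            nxt.append(e * e)          # W^+: erased only if both copies are erased
        z = nxt
    return z


def good_indices(eps, n, delta=1e-3):
    """Indices whose erasure probability is below delta (finite-n proxy for S(W))."""
    return {i for i, e in enumerate(bec_polarize(eps, n)) if e < delta}


if __name__ == "__main__":
    n = 10
    fraction_good = len(good_indices(0.5, n)) / 2 ** n
    print(fraction_good)  # approaches the capacity 1 - eps only as n grows
```

The average erasure probability over the $2^n$ synthesized channels stays equal to $\epsilon$ at every level, which is the BEC form of the conservation of mutual information under polarization.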

2.2 Degradedness and Nesting Property of Polar Codes 

A symmetric binary-input channel $W_2$ is said to be degraded with respect to a
channel $W_1$ if there exist random variables $X, Y, Z$ such that $X - Y - Z$ forms
a Markov chain, the conditional distribution of $Y$ given $X$ is $W_1$, and the
conditional distribution of $Z$ given $X$ is $W_2$. We will use the notation $W_2 \preceq W_1$
and $W_1 \succeq W_2$. For example, an AWGN channel of lower SNR is degraded
with respect to an AWGN channel of higher SNR. An erasure channel of
higher erasure probability is degraded with respect to an erasure channel of
lower erasure probability. A BSC of higher crossover probability (less than
1/2) is degraded with respect to a BSC of lower crossover probability.

It is known that the polarization operation preserves degradedness [3],
i.e. if $W_2 \preceq W_1$, then $W_2^+ \preceq W_1^+$ and $W_2^- \preceq W_1^-$. Following the recursion,
this implies that $W_2^s \preceq W_1^s$ for any $s$. This implies that in the limit of
the polarization process, the good bit indices $\mathcal{S}(W_2)$ for $W_2$ (i.e. those with
mutual information equal to 1) must be a subset of the good bit indices
$\mathcal{S}(W_1)$ for $W_1$. We will call this the nesting property. This nesting property
leads to the reliability ordering of the polarized channels mentioned in the
introduction.
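
The nesting property can be checked numerically in the simplest case. The sketch below uses binary erasure channels, for which a polarization step maps an erasure probability $\epsilon$ to $2\epsilon - \epsilon^2$ (for $W^-$) and $\epsilon^2$ (for $W^+$); both maps are increasing in $\epsilon$, so a more degraded BEC has a larger erasure probability at every index. The threshold `delta` and the function names are assumptions made for illustration, and the thresholded set at finite $n$ only approximates the limiting set of good bit indices.

```python
def bec_polarize(eps, n):
    """Erasure probabilities of the 2**n level-n channels of a BEC(eps)."""
    z = [eps]
    for _ in range(n):
        # each channel splits into its '-' child followed by its '+' child
        z = [v for e in z for v in (2 * e - e * e, e * e)]
    return z


def good_indices(eps, n, delta=1e-3):
    """Finite-n proxy for S(W): indices with erasure probability below delta."""
    return {i for i, e in enumerate(bec_polarize(eps, n)) if e < delta}


def is_nested(eps_weak, eps_strong, n, delta=1e-3):
    """True if the good set of the weaker BEC is contained in that of the stronger BEC."""
    return good_indices(eps_weak, n, delta) <= good_indices(eps_strong, n, delta)


if __name__ == "__main__":
    # BEC(0.5) is degraded with respect to BEC(0.3)
    print(is_nested(eps_weak=0.5, eps_strong=0.3, n=8))
```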

2.3 A Rateless Scheme: Basic Version 

We now present a capacity-achieving rateless scheme built on polar codes.
We assume that communication is to take place over a class $\mathcal{C}$ of
channels that are binary-input and symmetric, are totally ordered via
degradation, and have capacities spanning a continuum from 0 to $R$, where $R$ is
called the peak rate. We will show that the scheme is capacity-achieving in
the following sense: for any integer $k \geq 1$, if the capacity of the channel is
between $R/(k+1)$ and $R/k$, then the scheme can achieve a rate of $R/(k+1)$
reliably. While the scheme is not truly rateless in the sense of achieving any
arbitrary rate, a small modification of the scheme will make it rateless. This
will be described in subsection 2.6.

Consider a capacity-achieving polar code of rate $R$ and (long) block length
$N$ designed for a channel $W_1$ whose capacity is $R$. Let $\mathcal{S}(W_1)$ be the good
bit indices. Note that $|\mathcal{S}(W_1)| = NR$. At the first stage, we transmit all
$NR$ information bits on $\mathcal{S}(W_1)$ and the rest of the bits are frozen. If the
unknown channel $W$ is such that $W \succeq W_1$, then the receiver can decode
after this transmission and we are done.

If the unknown channel $W$ is weaker than $W_1$, then the receiver cannot
decode after the first transmission and the sender performs a second
transmission. Let $W_2$ be the channel in $\mathcal{C}$ whose capacity is $R/2$. In the
second transmission, we retransmit the information bits that were put on
$\mathcal{S}(W_1) - \mathcal{S}(W_2)$ in the first transmission using the same polar code but with
these information bits now put on $\mathcal{S}(W_2)$ and the rest of the bits frozen.

1 While strictly speaking capacity-achieving is a property of a sequence of codes of
increasing block length, here we keep the language lightweight and discuss concepts in
terms of a code of fixed and long block length. Suitable limiting arguments can be made
for a precise statement of the results.
Note that by the nesting property, $\mathcal{S}(W_2) \subset \mathcal{S}(W_1)$, so

$$|\mathcal{S}(W_1) - \mathcal{S}(W_2)| = NR/2 = |\mathcal{S}(W_2)|$$

and hence these bits can fit on $\mathcal{S}(W_2)$ in the second transmission. If $W \succeq W_2$,
then the receiver can decode these bits based on the second transmission only.
Using these bits as side information, the receiver now goes back to the first
transmission, where the bits in $\mathcal{S}(W_1) - \mathcal{S}(W_2)$ become frozen bits and
only the bits in $\mathcal{S}(W_2)$ need to be decoded. Since $W \succeq W_2$, these bits can
be decoded as well based on the first transmission, and we are done.

If the unknown channel $W$ is weaker than $W_2$, then the receiver cannot
decode after the second transmission and the sender performs a third
transmission. Let $W_3$ be the channel in $\mathcal{C}$ whose capacity is $R/3$. If the unknown
channel were $W_3$, then the bits sent on $\mathcal{S}(W_2) - \mathcal{S}(W_3)$ in both the first and
second transmissions should have been frozen, but they were not. So the sender
retransmits these information bits in the third transmission. The number of
such information bits is

$$2\,(NR/2 - NR/3) = NR/3,$$

so they can all be transmitted on the $\mathcal{S}(W_3)$ indices in the third transmission
(with the rest of the bits frozen). If the unknown channel $W$ is equal to or
stronger than $W_3$, then all these bits can be correctly decoded from the third
transmission. Among these bits, the ones sent in the second transmission
are now side information to be used to decode all the information bits sent
in the second transmission. These latter bits, together with the bits that
are decoded from the third transmission and
are repeated directly from the first transmission, become side information
to freeze the bits sent on the $\mathcal{S}(W_1) - \mathcal{S}(W_3)$ indices in the first transmission,
enabling the bits sent on $\mathcal{S}(W_1)$ in the first transmission to be decoded as
well.

In general, suppose that after $k$ transmissions, decoding has failed. Now we
shoot for a channel $W_{k+1}$ of rate $R/(k+1)$. We retransmit all the information bits
sent on $\mathcal{S}(W_k) - \mathcal{S}(W_{k+1})$ in all the previous $k$ transmissions. There are a
total of

$$k \left( \frac{NR}{k} - \frac{NR}{k+1} \right) = \frac{NR}{k+1}$$

such bits, and they can all be sent on the $\mathcal{S}(W_{k+1})$ indices in the $(k+1)$-th
transmission. Using the backward decoding strategy described above, we can
now go back and decode everything.

2.4 An Example

Figure 1 gives a simple example with $N = 16$, $K = 12$ for the peak-rate
code, i.e. we shoot for a maximum rate $R = 3/4$.

In the initial transmission, the 12 information bits $u_1, u_2, \ldots, u_{12}$ are sent
on the 12 channels with the largest reliabilities; the initial rate is $R_1 = R = 3/4$.
Other bits are frozen.

When the 1st transmission fails, the second half of the information bits,
$u_7, u_8, \ldots, u_{12}$, are sent on the 6 most reliable channels of the 2nd
transmission; the rate for the first and second transmissions combined is $R_2 = 6/16 =
R/2$.

When the 2nd transmission also fails, $u_5, u_6$ from the first transmission
and $u_{11}, u_{12}$ from the second transmission are sent on the best 4 channels of
the 3rd transmission; the rate for the first three transmissions combined is
$R_3 = 4/16 = R/3$.

Finally, when the 3rd transmission still fails, $u_4, u_{10}, u_{12}$ are sent on the
best 3 channels of the 4th transmission; the rate for the first four
transmissions combined is $R_4 = 3/16 = R/4$.
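
The bookkeeping of this example can be reproduced with a short sketch. It works purely with positions ranked by reliability (position 0 is the most reliable) and bit labels $1, \ldots, K$; it does not encode or decode anything. The function names are hypothetical, and the sketch assumes that $K$ is divisible by $2, 3, \ldots$ up to the number of transmissions, as is the case for $K = 12$ and 4 transmissions.

```python
def incremental_freezing_schedule(K, max_tx):
    """Bit labels carried by each transmission, listed from most to least reliable position."""
    txs = [list(range(1, K + 1))]  # 1st transmission carries u_1..u_K
    live = [K]                     # positions of each transmission still treated as information
    for k in range(1, max_tx):
        if K % (k + 1):
            raise ValueError("this sketch assumes K is divisible by k + 1")
        keep = K // (k + 1)        # information positions left per transmission
        moved = []
        for t in range(k):
            moved.extend(txs[t][keep:live[t]])  # least reliable live bits of transmission t
            live[t] = keep
        txs.append(moved)
        live.append(len(moved))
    return txs


def backward_decode_load(txs):
    """Number of bits each transmission still has to decode when processed last to first."""
    known, load = set(), []
    for tx in reversed(txs):
        load.append(sum(1 for u in tx if u not in known))
        known.update(tx)           # decoded bits become side information (frozen) earlier on
    return load[::-1]


if __name__ == "__main__":
    schedule = incremental_freezing_schedule(K=12, max_tx=4)
    for i, tx in enumerate(schedule, start=1):
        print(i, tx)
    print(backward_decode_load(schedule))
```

With $K = 12$ the fourth entry of the schedule is $u_4, u_{10}, u_{12}$, matching the text, and under backward decoding every transmission is left with 3 unknown bits, i.e. a rate of $3/16 = R/4$ per transmission.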

2.5 Incremental Freezing 

We can think of the above scheme as incremental freezing of information bits
in a polar code. If we knew the channel, then we would freeze exactly the
right bits. However, since we don't know what the channel is, we should be
more aggressive and freeze few bits and send many information bits. If we
are lucky and the channel is good, then all the information bits get through.
On the other hand, if the channel is not as good as we hope, then we can
retroactively freeze more bits by retransmitting and decoding these bits in
future transmissions. Because of the nesting property, we know exactly which
bits we should retroactively freeze. And we effectively freeze more and more
bits incrementally as the actual channel turns out to be worse and worse than expected.

[Figure 1 content: mapping of information bits to sub-channels, ordered from most reliable to least reliable.]

Figure 1: A simple example of incremental freezing with N = 16, K = 12,
and up to 4 transmissions.

The key to why the scheme is capacity-achieving is indeed the nesting
property. At each transmission, we don't know what the unknown channel
$W$ is, but we are always assured that the good bit indices for the unknown
channel are a subset of the information bit indices used in that transmission. So we
never "waste" any mutual information in any transmission. We may not be
able to decode the bits because too few bits are frozen, but this can always
be rectified by retroactive freezing.

2.6 Extension to Arbitrary Rates 

The above scheme is not truly rateless as it can only achieve the rates
$R, R/2, R/3, \ldots$, rather than a set of arbitrary rates. A simple extension of
the scheme will rectify this issue. The idea is that in future transmissions, new
information bits can be transmitted in combination with bits retransmitted
from previous transmissions. For example, suppose we want to achieve rate
$R$ on the first transmission, and rate $R - \Delta$ on the first and second
transmissions combined, where $\Delta > 0$ is arbitrary. The first transmission is exactly
the same as before. In the second transmission, instead of retransmitting
the information bits sent on the $NR/2$ least reliable positions in the first
transmission, one should retransmit only the bits sent on the $N\Delta$ least
reliable positions in the first transmission and add $N(R - 2\Delta)$ new information
bits, to send a total of $N(R - \Delta)$ bits in the most reliable positions of the second
transmission.
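
The counts in this extension can be tabulated with a small sketch. The helper name and the restriction $0 < \Delta \le R/2$ (so that the number of new bits is non-negative) are assumptions made here for illustration. Exact fractions are used so that the arithmetic is not blurred by rounding.

```python
from fractions import Fraction


def second_transmission_split(N, R, delta):
    """Return (retransmitted bits, new bits) carried by the second transmission."""
    R, delta = Fraction(R), Fraction(delta)
    if not 0 < delta <= R / 2:
        raise ValueError("this sketch assumes 0 < delta <= R/2")
    retransmitted = N * delta        # bits moved off the N*delta least reliable positions
    new_bits = N * (R - 2 * delta)   # fresh information bits
    return retransmitted, new_bits


if __name__ == "__main__":
    N, R, delta = 16, Fraction(3, 4), Fraction(1, 8)
    retransmitted, new_bits = second_transmission_split(N, R, delta)
    total_information = N * R + new_bits
    print(retransmitted, new_bits, total_information / (2 * N))
```

For $N = 16$, $R = 3/4$ and $\Delta = 1/8$, the second transmission carries 2 retransmitted bits and 8 new bits, i.e. $N(R - \Delta) = 10$ bits in total, and the combined rate over both transmissions is $20/32 = R - \Delta$. Setting $\Delta = R/2$ gives no new bits and recovers the basic scheme.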

2.7 Extension to Parallel Channels 

It is known that polarization holds for much more general channels than
symmetric binary-input channels [9]. So presumably one can extend the
theory of rateless polar coding to more general classes of channels beyond
binary-input symmetric ones. Here we pursue one such generalization: parallel
channels.


A parallel channel $W$ is composed of $Q$ independent component sub-channels
$W^{(1)}, \ldots, W^{(Q)}$. Here, we focus on parallel channels whose component
channels have binary inputs and are symmetric. We are interested in
such parallel channels because they model AWGN channels with $2^Q$-PAM
input. Using techniques like bit-interleaved coded modulation (BICM), the
AWGN channel with $2^Q$-PAM input can be modeled as a parallel channel
with $Q$ binary-input AWGN channels.

A parallel channel $W$ is degraded with respect to a channel $V$ if each of the
components of $W$ is degraded with respect to the corresponding component
of $V$. Let $\mathcal{C}$ be a class of parallel channels which are totally ordered via
degradation. This can, for example, model a class of AWGN channels with
$2^Q$-PAM input, where all the component sub-channels depend on the (single)
SNR of the AWGN channel. We now show that our rateless scheme extends
naturally to this class of channels.

Consider a parallel channel $W_1$ with capacity $R$; we decompose it into $Q$
parallel sub-channels with rates $R_{11}, R_{12}, \ldots, R_{1Q}$, where $R = R_{11} + R_{12} +
\cdots + R_{1Q}$. For the first transmission, we design $Q$ polar codes of block length
$N$ for the $Q$ parallel sub-channels, with rates $R_{11}, R_{12}, \ldots, R_{1Q}$, respectively.
If the unknown channel $W$ is such that $W \succeq W_1$, then the receiver can decode
after this transmission and we are done.

If the unknown channel $W$ is weaker than $W_1$, then the receiver cannot
decode after the first transmission and the sender performs a second
transmission. Let the capacity of $W_2$ be $R/2$ and the capacities of the $Q$ parallel
sub-channels be $R_{21}, R_{22}, \ldots, R_{2Q}$, where $R/2 = R_{21} + R_{22} + \cdots + R_{2Q}$. In the
second transmission, we retransmit $NR/2$ information bits. These bits are
$NR_{11} - NR_{21}$ bits from the information bits of the polar code used for the
first sub-channel, $NR_{12} - NR_{22}$ bits from the information bits of the polar
code used for the second sub-channel, $\ldots$, and $NR_{1Q} - NR_{2Q}$ bits from
the information bits of the polar code of the $Q$-th sub-channel, so that the
total is $NR - NR/2 = NR/2$. These bits
are then distributed into the $Q$ polar codes with rates $R_{21}, R_{22}, \ldots, R_{2Q}$,
respectively. If $W \succeq W_2$, then the receiver can decode these bits based on the
second transmission only. Using these bits as side information, the receiver
now goes back to the first transmission. Since $W \succeq W_2$, the first transmission
can be decoded as well.

If the unknown channel $W$ is weaker than $W_2$, then the receiver cannot
decode after the second transmission and the sender performs a third
transmission. Let the capacity of $W_3$ be $R/3$ and the capacities of the $Q$ sub-channels
be $R_{31}, R_{32}, \ldots, R_{3Q}$, where $R/3 = R_{31} + R_{32} + \cdots + R_{3Q}$. In the third
transmission, we retransmit $NR/3$ information bits. These bits come from
the information bits of the $Q$ polar codes used in the first transmission and from
the $Q$ polar codes used in the second transmission. More precisely, there are
$NR_{21} - NR_{31}$ bits from the information bits of each of the two polar codes
(first and second transmissions) for the first parallel channel, $NR_{22} - NR_{32}$
bits from the information bits of each of the two polar codes for the
second parallel channel, $\ldots$, and $NR_{2Q} - NR_{3Q}$ bits from
the information bits of each of the two polar codes for the
$Q$-th parallel channel, for a total of $2(NR/2 - NR/3) = NR/3$ bits. These bits are then distributed into $Q$
polar codes with rates $R_{31}, R_{32}, \ldots, R_{3Q}$, respectively.

In general, the $k$-th transmission sends $NR/k$ information bits. They are
collected from $NR/(k(k-1))$ information bits sent at each previous
transmission. To decode the $m$-th transmission ($1 \leq m < k$), the receiver uses
the side information from the $(m+1)$-th, $(m+2)$-th, $\ldots$, $k$-th decoded data to
freeze

$$RN \left( \frac{1}{m(m+1)} + \frac{1}{(m+1)(m+2)} + \cdots + \frac{1}{(k-1)k} \right)$$

information bits and only needs to decode

$$\frac{RN}{m} - RN \left( \frac{1}{m} - \frac{1}{k} \right) = RN \cdot \frac{1}{k}$$

information bits. Note that with the side information, all transmissions have
the same rate $R/k$ after $k$ transmissions.
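
The count above rests on the telescoping identity $\sum_{j=m}^{k-1} \frac{1}{j(j+1)} = \frac{1}{m} - \frac{1}{k}$. The following sketch checks it with exact fractions; the function names are hypothetical, and `RN` stands for the product $RN$ of the peak rate and the block length.

```python
from fractions import Fraction


def frozen_by_side_information(RN, m, k):
    """Bits of the m-th transmission that transmissions m+1..k retransmit."""
    # transmission j+1 takes RN / (j (j+1)) bits from every earlier transmission
    return sum(Fraction(RN) / (j * (j + 1)) for j in range(m, k))


def bits_left_to_decode(RN, m, k):
    """Information bits the m-th transmission must still decode after k transmissions."""
    return Fraction(RN) / m - frozen_by_side_information(RN, m, k)


if __name__ == "__main__":
    RN, k = 12, 4
    print([bits_left_to_decode(RN, m, k) for m in range(1, k + 1)])
```

For $RN = 12$ and $k = 4$, every transmission is left with $RN/k = 3$ bits to decode, in line with the remark that all transmissions have the same rate $R/k$.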


3 Simulation Results 

We performed some simulations over binary-input AWGN channels to assess 
the finite block length performance of the rateless polar coding scheme we 
proposed. 

In the scheme proposed, we assume that retransmitted bits are decoded
based only on the received signal of the retransmission. This is sufficient
to achieve capacity. However, the receptions of these bits from the previous
transmissions still provide some information, and finite block length performance
can be improved by using a soft decoder that combines the receptions across
multiple transmissions. We have designed such a soft decoder. An example
of how it operates can be seen in Figure 2.
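
As a rough illustration of the combining step only, the sketch below merges two log-likelihood ratios (LLRs) for the same information bit, one from each transmission. It assumes that the two LLRs can be treated as conditionally independent observations of the bit, in which case they simply add; this is an assumption of the sketch and not a description of the full soft decoder. The function names and the sign convention (positive LLR favours bit 0) are likewise illustrative.

```python
def combine_llrs(llr_first, llr_second):
    """Combine two LLRs of the same bit, assuming conditionally independent observations."""
    return llr_first + llr_second


def hard_decision(llr):
    """Map an LLR to a bit estimate; positive values favour 0 (assumed convention)."""
    return 0 if llr >= 0 else 1


if __name__ == "__main__":
    llr_tx1 = -0.4  # weak and misleading evidence from the first transmission
    llr_tx2 = 1.5   # stronger evidence from the retransmission
    print(hard_decision(llr_tx1), hard_decision(combine_llrs(llr_tx1, llr_tx2)))
```

In this toy case the first reception alone would give the estimate 1, while the combined LLR gives 0; the retransmission dominates without the first reception being discarded.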

Figure 3 shows the performance of the soft-combining SC decoder. The
peak-rate polar code is (2048, 1024), yielding a peak rate of 1/2. We evaluate
the frame error rate after the second transmission, i.e. the effective rate is
1/4. The blue curve shows the performance of the scheme. In the scheme,
the number of bits from the first transmission retransmitted in the second
transmission is exactly the same as the number of bits not retransmitted.
But actually the retransmitted bits get slightly better treatment since they
can be decoded by combining receptions from both transmissions. To take
advantage of this, a few more bits can be retransmitted. By optimizing this
number, a small improvement can be obtained. This is shown by the green
curve, where 10 more bits are retransmitted. This optimized performance
essentially matches that of a block length 2048, rate 1/4 code (red curve),
but is about 0.3 dB from a block length 4096 polar code of rate 1/4 designed
for that signal-to-noise ratio. Note that since the effective block length of
the rateless scheme after two transmissions is 4096, we see that there is still
a gap, albeit small, with a code optimized for that SNR. This is the price of
being rateless.

[Figure 2 content, upper panel: bit indices in SC decoding order, from first decoded to last decoded, with the Z value and rank of each index.]

Z:    1.000 1.000 0.999 0.941 0.997 0.897 0.834 0.352 0.988 0.790 0.691 0.197 0.545 0.106 0.063 0.001
Rank: 16 15 14 11 13 10 9 5 12 8 7 4 6 3 2 1

Figure 2: Upper figure: Transmission. Note that unlike Figure 1, the channels
here are ordered by the successive cancellation order rather than by
reliability order. Lower figure: Soft decoding. Bits like $v_1$ that are transmitted
twice are estimated by combining the LLRs from both transmissions.

Figure 3: SC performance.

Figure 4: List Decoder performance.

Figure 4 shows the performance using a soft-combining adaptive SC-list
decoder with CRC [2, 10]. We can see that an optimized rateless scheme is
about 0.25 dB from a polar code with block length 4096.

4 Conclusion 

In this paper, we propose a capacity-achieving rateless scheme based on polar
codes. In the first transmission, a peak-rate polar code is used, and in the
later transmissions, information bits are retransmitted and decoded, and
hence incrementally frozen to allow for decoding of the first transmission.
The conceptual simplicity of the scheme attests to the inherent flexibility of
polar codes.